[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows#192573
Conversation
3c4683b to
a544490
Compare
| kibanaBaseUrl: this.kibanaBaseUrl, | ||
| logger, | ||
| maintenanceWindowsService: new MaintenanceWindowsService({ | ||
| cacheInterval: this.config.rulesSettings.cacheInterval, |
There was a problem hiding this comment.
this is used just for reducing the cache interval for testing so I reused the setting from the rules settings service.
| }); | ||
|
|
||
| // Only look at maintenance windows for this rule category | ||
| const maintenanceWindows = activeMaintenanceWindows.filter(({ categoryIds }) => { |
There was a problem hiding this comment.
active maintenance windows are cached but whether those active maintenance windows apply to this rule type based on category are checked every time this function is called.
| const executeEvents = events.filter((event) => event?.event?.action === 'execute'); | ||
|
|
||
| // the first execute event should not have any maintenance window ids because there were no alerts during the | ||
| // first execution |
There was a problem hiding this comment.
this is a change from before where maintenance windows were loaded and set in the event log even if there were no alerts during the rule execution, so I believe this is more correct than before.
|
Pinging @elastic/response-ops (Team:ResponseOps) |
|
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management) |
|
@elasticmachine merge upstream |
|
@elasticmachine merge upstream |
pmuellr
left a comment
There was a problem hiding this comment.
code LGTM
Noted what is either a bug or me being confused in getActiveMaintenanceWindows(), which seems like it would never return any windows since it's searching on a unique date generated in that method.
| } | ||
| } | ||
|
|
||
| public async loadMaintenanceWindows( |
There was a problem hiding this comment.
I debugged the code to make sure we weren't somehow going down here in the maintance window UX itself, because then we'd have a (potential) problem - the cached one could be stale.
It doesn't.
However, perhaps we should work in "cache" on the method name here, somehow, in case someone uses this and doesn't realize it could be stale. loadCacheableMaintenanceWindows() (yikes!)
Actually, maybe easier to swap names w/ loadMaintenanceWindows() and the private getMaintenaceWindows().
There was a problem hiding this comment.
Switched loadMaintenanceWindows and getMaintenanceWindows in 883d909
|
|
||
| public processAlerts(opts: ProcessAlertsOpts) { | ||
| this.legacyAlertsClient.processAlerts(opts); | ||
| public async processAlerts(opts: ProcessAlertsOpts) { |
There was a problem hiding this comment.
complete aside, but if the occaisonal event loop delays we seen in rules are caused by processAlerts(), making this async may help - chunks things up a bit, give other node events get a chance to run
| const startDateWithCacheOffset = new Date(startDate.getTime() + cacheIntervalMs); | ||
| const startDateWithCacheOffsetISO = startDateWithCacheOffset.toISOString(); | ||
| eventsKuery = nodeBuilder.or([ | ||
| nodeBuilder.is('maintenance-window.attributes.events', startDateISO), |
There was a problem hiding this comment.
Confused by this. We're generating a new date in this method, and then searching for it explicitly in the SO's? I would think that would literally never work.
I see the change from using date ranges (which makes sense) to the is check was done here: #157112
Can we find out what's going on? Either it's a bug, or needs a comment :-)
There was a problem hiding this comment.
I was confused as well but I tested it and it does work. I think because events is mapped as a date_range it just works? cc @JiaweiWu do you know?
There was a problem hiding this comment.
oooooh! Yup! Thx, it is a date_range and you can do term queries against them. Either forgot that, or TIL
https://www.elastic.co/guide/en/elasticsearch/reference/current/range.html
| ); | ||
|
|
||
| // wait so cache expires | ||
| await setTimeoutAsync(10000); |
There was a problem hiding this comment.
Feels like we should use a common constant instead of literal. Will be easier to fix later :-)
|
@elasticmachine run docs-build |
dominiqueclarke
left a comment
There was a problem hiding this comment.
obs-ux-management changes LGTM
|
@elasticmachine merge upstream |
💛 Build succeeded, but was flaky
Failed CI StepsMetrics [docs]
History
To update your PR or re-run it, just comment with: cc @ymao1 |
💔 All backports failed
Manual backportTo create the backport manually run: Questions ?Please refer to the Backport tool documentation |
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
… alerts during rule execution and caching loaded maintenance windows (elastic#192573) Resolves elastic#184324 ## Summary This PR moves the loading of maintenance windows further down in rule execution so maintenance windows are only loaded when a rule execution generates alerts. Also caches maintenance windows per space to reduce the number of requests. ## To Verify 1. Add some logging to x-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts to indicate when windows are being fetched and when they're returning from the cache. 2. Create and run some rules in different spaces with and without alerts to see that the maintenance windows are only loaded when there are alerts and that the windows are returned from the cache when the cache has not expired. --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> (cherry picked from commit 93414a6)
…re are alerts during rule execution and caching loaded maintenance windows (#192573) (#194191) # Backport This will backport the following commits from `main` to `8.x`: - [[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows (#192573)](#192573) <!--- Backport version: 8.9.8 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Ying Mao","email":"ying.mao@elastic.co"},"sourceCommit":{"committedDate":"2024-09-26T12:59:36Z","message":"[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows (#192573)\n\nResolves https://github.com/elastic/kibana/issues/184324\r\n\r\n## Summary\r\n\r\nThis PR moves the loading of maintenance windows further down in rule\r\nexecution so maintenance windows are only loaded when a rule execution\r\ngenerates alerts. Also caches maintenance windows per space to reduce\r\nthe number of requests.\r\n\r\n## To Verify\r\n\r\n1. Add some logging to\r\nx-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts\r\nto indicate when windows are being fetched and when they're returning\r\nfrom the cache.\r\n2. Create and run some rules in different spaces with and without alerts\r\nto see that the maintenance windows are only loaded when there are\r\nalerts and that the windows are returned from the cache when the cache\r\nhas not expired.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>\r\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"93414a672c2767b035110fa2d811cc040af57727","branchLabelMapping":{"^v9.0.0$":"main","^v8.16.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Feature:Alerting","release_note:skip","Team:ResponseOps","v9.0.0","backport:prev-minor","ci:project-deploy-observability","Team:obs-ux-management","v8.16.0"],"number":192573,"url":"https://github.com/elastic/kibana/pull/192573","mergeCommit":{"message":"[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows (#192573)\n\nResolves https://github.com/elastic/kibana/issues/184324\r\n\r\n## Summary\r\n\r\nThis PR moves the loading of maintenance windows further down in rule\r\nexecution so maintenance windows are only loaded when a rule execution\r\ngenerates alerts. Also caches maintenance windows per space to reduce\r\nthe number of requests.\r\n\r\n## To Verify\r\n\r\n1. Add some logging to\r\nx-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts\r\nto indicate when windows are being fetched and when they're returning\r\nfrom the cache.\r\n2. Create and run some rules in different spaces with and without alerts\r\nto see that the maintenance windows are only loaded when there are\r\nalerts and that the windows are returned from the cache when the cache\r\nhas not expired.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>\r\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"93414a672c2767b035110fa2d811cc040af57727"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","labelRegex":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/192573","number":192573,"mergeCommit":{"message":"[Response Ops][Alerting] Only load maintenance windows when there are alerts during rule execution and caching loaded maintenance windows (#192573)\n\nResolves https://github.com/elastic/kibana/issues/184324\r\n\r\n## Summary\r\n\r\nThis PR moves the loading of maintenance windows further down in rule\r\nexecution so maintenance windows are only loaded when a rule execution\r\ngenerates alerts. Also caches maintenance windows per space to reduce\r\nthe number of requests.\r\n\r\n## To Verify\r\n\r\n1. Add some logging to\r\nx-pack/plugins/alerting/server/task_runner/maintenance_windows/maintenance_windows_service.ts\r\nto indicate when windows are being fetched and when they're returning\r\nfrom the cache.\r\n2. Create and run some rules in different spaces with and without alerts\r\nto see that the maintenance windows are only loaded when there are\r\nalerts and that the windows are returned from the cache when the cache\r\nhas not expired.\r\n\r\n---------\r\n\r\nCo-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>\r\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"93414a672c2767b035110fa2d811cc040af57727"}},{"branch":"8.x","label":"v8.16.0","labelRegex":"^v8.16.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT-->
Resolves #184324
Summary
This PR moves the loading of maintenance windows further down in rule execution so maintenance windows are only loaded when a rule execution generates alerts. Also caches maintenance windows per space to reduce the number of requests.
To Verify